65 Data and Ethics
65.1 Introduction
To extend our thinking about the ethics of research, and of data analysis in particular, you should be aware of four central areas of concern and current debate in the field: privacy, security, bias, and fairness.
65.2 Data Privacy
Legal compliance is critical in data handling, including data collection, storage, and analysis. As analysts and researchers, we need to understand and adhere to relevant laws and regulations governing data privacy (e.g., GDPR in the European Union or HIPAA in the United States).
These dictate how data, particularly data which is personally identifiable, should be collected, processed, and stored, making sure that privacy rights of individuals are protected. Given the sometimes personal nature of data collected in sport, this is incredibly important to consider.
It’s also crucial to obtain informed consent from participants, which means they must be fully informed about the nature of the research, the type of data collected, how it will be used, and the potential risks involved.
Consent should be documented and obtained freely from participants. When submitting a paper to a peer-reviewed journal, for example, you’ll usually be required to confirm that appropriate ethical steps were taken, including gaining informed consent from your participants.
Anonymising your data is a powerful way to protect individual identities. Techniques like ‘pseudonymisation’ or data masking can be used to ensure that personal identifiers are removed, or at least replaced or encrypted. This helps to minimise risk, while still preserving much of the data’s ‘analytical utility’.
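As an illustration of pseudonymisation, the following is a minimal sketch using only the Python standard library. It replaces a name with a keyed hash (HMAC-SHA256), so the same athlete always receives the same pseudonym and repeated measurements can still be linked. The field names (`athlete_name`, `sprint_time_s`), the function names, the example records, and the key value are all hypothetical. Note that pseudonymised data can be re-identified by anyone holding the key, so it is generally still treated as personal data under regulations such as GDPR.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using a keyed hash."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Shorten the hex digest for readability (an illustrative choice).
    return digest.hexdigest()[:16]


def pseudonymise_records(records, secret_key, id_field="athlete_name"):
    """Copy each record, replacing the identifying field with a pseudonym."""
    cleaned = []
    for record in records:
        row = dict(record)  # copy, so the original record is left unchanged
        row["athlete_id"] = pseudonymise(row.pop(id_field), secret_key)
        cleaned.append(row)
    return cleaned


# Hypothetical key: in practice, generate it randomly and store it
# separately from the data set, with restricted access.
SECRET_KEY = b"replace-with-a-randomly-generated-key"

# Hypothetical example records.
records = [
    {"athlete_name": "Jane Example", "sprint_time_s": 11.42},
    {"athlete_name": "Sam Sample", "sprint_time_s": 10.98},
    {"athlete_name": "Jane Example", "sprint_time_s": 11.37},
]

pseudonymised = pseudonymise_records(records, SECRET_KEY)
```

A keyed hash is used here rather than a plain hash because names drawn from a small, known squad list could otherwise be recovered simply by hashing every candidate name and comparing the results.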
‘Privacy by Design’ is a concept that involves integrating data privacy into the design phase of sports data projects. By considering privacy at the earliest stages, we can ensure that privacy protection covers the entire data life-cycle and is an intrinsic part of our data analytics workflow.
65.3 Data Security
Secure data storage is key to data security in sport analytics. It may involve using secure databases, servers, or cloud services that offer robust protection against unauthorised access and potential breaches.
Encryption is a critical layer of protection for data, both during transmission and at rest. Encrypted data is unreadable to anyone without the decryption keys, which protects it even if it is intercepted or stolen.
Access to sensitive sporting data should be governed by strict access controls. This means only authorised personnel with a legitimate need to handle the data should be able to do so, which is usually handled through appropriate authentication and authorisation mechanisms.
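The distinction between authentication (confirming who someone is) and authorisation (deciding what they may do) can be sketched as a simple role-based check. This is an illustrative sketch only: the roles, the permission names, and the `User` class are hypothetical, and a real project would normally rely on the access controls provided by its database, cloud service, or institution.

```python
from dataclasses import dataclass

# Hypothetical mapping from roles to the actions each role may perform.
ROLE_PERMISSIONS = {
    "lead_researcher": {"read_raw", "read_pseudonymised", "export"},
    "analyst": {"read_pseudonymised"},
    "coach": set(),
}


@dataclass(frozen=True)
class User:
    username: str
    role: str
    authenticated: bool = False  # set by a separate login step (not shown)


def is_authorised(user: User, action: str) -> bool:
    """Allow an action only for authenticated users whose role grants it."""
    if not user.authenticated:
        return False
    # Unknown roles receive no permissions (deny by default).
    return action in ROLE_PERMISSIONS.get(user.role, set())


analyst = User("analyst01", "analyst", authenticated=True)
guest = User("guest", "analyst", authenticated=False)

can_read_pseudonymised = is_authorised(analyst, "read_pseudonymised")
can_read_raw = is_authorised(analyst, "read_raw")
guest_can_read = is_authorised(guest, "read_pseudonymised")
```

Here the analyst can read pseudonymised data but not the raw identifiable records, and an unauthenticated user is refused regardless of role. Denying by default means that a role missing from the mapping is refused rather than accidentally allowed.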
When conducting research within a university context, you will be required to satisfy the ethics committee that you have appropriate steps for data storage and access in place.
To maintain and improve data security measures, regular audits should be conducted. These audits help in identifying vulnerabilities, assessing the effectiveness of current security measures, and updating protocols as needed.
Again, if working within a university setting, there will be formal mechanisms for these processes.
65.4 Bias
As noted in a previous section, we need to learn to recognise potential biases, both in our data and in our analytical processes. This can include understanding the sources of bias, whether from the data collection, algorithm design, or interpretation stages.
‘Fairness’ in data collection and selection is key to addressing bias (see below). It requires us to ensure that our data accurately represents all relevant groups and variables that could affect the outcome of our analyses.
Issues around gender and race, for example in terms of equality of access to resources, might be considered confounding factors or mediating variables in our analysis.
Our algorithms and models should be reviewed for fairness as well as statistical/theoretical integrity, with strategies implemented to detect and mitigate bias. This could involve using diverse training datasets, or designing algorithms that are sensitive to fairness considerations. This is an increasingly contentious issue, and worth bearing in mind during the analytical process.
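One simple way to begin detecting bias in a model's outputs is to compare the rate of favourable outcomes across groups. The sketch below computes a selection rate per group and the gap between the highest and lowest rates. The group labels, the `selected` field, and the example data are hypothetical, and a gap on its own does not prove unfairness; it only flags a difference that deserves closer examination.

```python
from collections import defaultdict


def selection_rates(records, group_key="group", outcome_key="selected"):
    """Return the share of favourable outcomes within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [favourable, total]
    for record in records:
        tally = counts[record[group_key]]
        tally[0] += int(bool(record[outcome_key]))
        tally[1] += 1
    return {group: fav / total for group, (fav, total) in counts.items()}


def rate_gap(rates):
    """Return the difference between the highest and lowest group rates."""
    return max(rates.values()) - min(rates.values())


# Hypothetical model outputs, e.g. players flagged for a talent pathway.
predictions = [
    {"group": "A", "selected": True},
    {"group": "A", "selected": True},
    {"group": "A", "selected": True},
    {"group": "A", "selected": False},
    {"group": "B", "selected": True},
    {"group": "B", "selected": False},
    {"group": "B", "selected": False},
    {"group": "B", "selected": False},
]

rates = selection_rates(predictions)
gap = rate_gap(rates)
```

With this example data, group A is selected three times out of four and group B once out of four. Whether such a gap is justified depends on the context and on other variables, so it should prompt investigation rather than an automatic conclusion.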
Bias can be a ‘moving target’; hence, ongoing monitoring is essential. We should have mechanisms in place to continually assess and correct biases in sports data analytics to ensure integrity and fairness in the long term. Being part of a larger research group, or having a mentor, can be a good way of addressing this.
65.5 Fairness
Promoting equity in data analysis means ensuring that outcomes do not favour one group over another without justification. This requires a balanced approach and careful interpretation of data insights.
Data should represent all relevant groups in proportion to their presence in the sport context being analysed. This can help prevent ‘skewed’ insights and can support the development of fair and more equitable data-driven decisions.
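A basic representation check compares each group's share of the sample with its share of the population being studied. The sketch below assumes the reference shares are already known from a reliable source. The group labels, the reference values, and the function name are hypothetical.

```python
from collections import Counter


def representation_gaps(sample_groups, reference_shares):
    """Return sample share minus reference share for each reference group.

    A negative value means the group is under-represented in the sample.
    """
    total = len(sample_groups)
    if total == 0:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }


# Hypothetical reference shares for the sport context being analysed.
reference = {"women": 0.5, "men": 0.5}

# Hypothetical sample: one group label per participant.
sample = ["men"] * 6 + ["women"] * 2

gaps = representation_gaps(sample, reference)
```

In this example, women make up a quarter of the sample against a reference share of one half, giving a gap of -0.25. How large a gap is acceptable is a judgement for the analyst, and it may call for further recruitment, re-weighting, or a clearly stated limitation.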
We should critically examine the assumptions underlying our models and interpretations. Unchallenged assumptions can lead to biased outcomes, so we should remain aware of these assumptions and continue to test them.
Maintaining transparency about our analytical methods, and being accountable for our findings, are essential in our analysis. This involves clear reporting of methodologies, assumptions, limitations, and potential conflicts of interest in any reports or outputs that result from our analysis.